a soft segment modeling approach for duration modeling in phoneme recognition systems

نویسندگان

farbod razazi

abolghasem sayadiyan

چکیده

the geometric distribution of states duration is one of the main performance limiting assumptions of hidden markov modeling of speech signals. stochastic segment models, generally, and segmental hmm, specifically, overcome this deficiency partly at the cost of more complexity in both training and recognition phases. in this paper, a new duration modeling approach is presented. the main idea of the model is to consider the effect of adjacent segments on the probability density function estimation and evaluation of each acoustic segment. this idea not only makes the model robust against segmentation errors, but also it models gradual change from one segment to the next one with a minimum set of parameters. the proposed idea is analytically formulated and tested on a timit based context independent phoneme classification system. during the test procedure, the phoneme classification of different phoneme classes was performed by applying various proposed recognition algorithms. the system was optimized and the results have been compared with a continuous density hidden markov model (cdhmm) with similar computational complexity. the results show slight improvement in phoneme recognition rate in comparison with standard continuous density hidden markov model. this indicates improved compatibility of the proposed model with the speech nature.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

Near-miss Modeling: a Segment-based Approach to Speech Recognition Near-miss Modeling: a Segment-based Approach to Speech Recognition

Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors a...

متن کامل

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

Evaluation of Soft Segment Modeling on a Context Independent Phoneme Classification System

ةــصلاخلا : ُـي ةيساسلأا تاضارتفلاا نم ةلاحلا دادتملا يسدنهلا عيزوتلا ربتع ةراشلأل فوآرام ةجذمن ءادأ نم د ُّ حت يتلا ةيتوص لا . ة يعباتتلا ءاز جلأا جذو منأ نإ ف ،مو معلا ى لعو – تا يئزج كلذ آو ،ةيئاوش علا HMM ى لعو ، يف ةبوعص ةجرد يف ةدا يز ى لإ هرود ب يدؤ ي اً يئزج صقن لا اذ ه زواجتل ،صوصخلا روطلا د يدحتو بيرد ت . جذو من نمض توصلا تايئاصحلال يجيرد تلا ي نمزلا ر يغتلا جرد ن م ل ،ضار تفلاا اذ ه ى ل...

متن کامل

Near-miss modeling: a segment-based approach to speech recognition

Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors a...

متن کامل

Modeling duration patterns for speaker recognition

We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector comprised of either the durations of the individual phones making up the word, or the HMM states making up the phone. We model the vectors using mixtures of Gaussians. The speaker specific models are obtaine...

متن کامل

منابع من

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید


عنوان ژورنال:
the modares journal of electrical engineering

ناشر: tarbiat modares university

ISSN 2228-527 X

دوره 4

شماره 1 2004

میزبانی شده توسط پلتفرم ابری doprax.com

copyright © 2015-2023